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Abstract 

A brief overview is provided of cosinor-based tecliniques for tlie analysis of time 
series in chronobiology. Conceived as a regression problem, the method is applicable 
to non-equidistant data, a major advantage. Another dividend is the feasibility of 
deriving confidence intervals for parameters of rhythmic components of known 
periods, readily drawn from the least squares procedure, stressing the importance of 
prior (external) information. Originally developed for the analysis of short and sparse 
data series, the extended cosinor has been further developed for the analysis of long 
time series, focusing both on rhythm detection and parameter estimation. Attention 
is given to the assumptions underlying the use of the cosinor and ways to determine 
whether they are satisfied. In particular, ways of dealing with non-stationary data 
are presented. Examples illustrate the use of the different cosinor-based methods, 
extending their application from the study of circadian rhythms to the mapping of 
broad time structures (chronomes). 

Keywords: Chronobiology, Chronome, Circadian, Cosinor, External information. 
Regression, Rhythm parameters, Stationarity 



Introduction 

Non-random variations are found as a function of time at the cellular level, in tissue cul- 
ture, as well as in multi-cellular organisms at different levels of physiologic organization 
[1]. Multi-frequency rhythms usually account for a sizeable portion of the variability [2]. 
While there is presently much interest in studying circadian rhythms, the biological time 
structure covers many different ranges of periods beyond the 24-hour day, from fractions 
of seconds in single neurons to seconds in the cardiac and respiratory cycles, and a few 
hours in certain endocrine functions. Cycles with periods of about a week, about a 
month, and about a year are also ubiquitous, as are some other newly discovered cycles 
with periods of about 5 and 16 months, and much longer periods [3]. 

The partly built-in nature of circadian rhythms [4,5] is now widely accepted, as is the 
fact that they are amenable to synchronization by cycles in the environment (e.g., light- 
ing and feeding schedules) [6]. More generally, environmental geophysical cycles such 
as the day-light cycle, the tides, the phases of the moon, the seasons, as well as a host 
of other cycles shared between living organisms and the environment in which they 
evolved, all serve as synchronizers for partly endogenous rhythms [7,8]. 

The application of chronobiology and its concepts to biology and medicine depends 
upon the quantitative evaluation of data collected as a function of time. The inclusion 
of time as a primordial factor in chronobiological investigations broadens the scope of 
methods for data analysis. The methods presented herein serve the purposes of rhythm 
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detection and parameter estimation, with applications in the early diagnosis of altered 
rhythm characteristics indicative of a heightened risk, the optimization of treatment 
by timing, and a wider understanding of how our physiology is influenced by our 
environment. 

Data collection and study design 

Before turning to the methodology itself, it is important to consider aspects of data col- 
lection and study design [9] bearing on the choice of analytical tools used for data ana- 
lysis. Biological data (Yi, i = 1, 2, N) are typically obtained by having a clock trigger 
the system (instrument, sensor) to measure a biological variable, yielding a set of data 
at discrete sampling times (ti, i = 1, 2, N). Whether the variable examined is discrete 
(e.g., mitotic counts) or continuous (e.g., oral temperature), the numerical values attrib- 
uted to Yi are limited in accuracy and precision by the instrumentation used. Any finite 
variation of a biological system takes place during a non-zero time interval rather than 
instantaneously. In terms of data analysis, this means that there is a cut-off frequency fs 
beyond which the spectrum of the biological variations is practically zero [10]. Whether 
the transducer used to measure a given biological variable is analog or digital, it takes a 
certain time for it to respond and deliver a reading, so that rhythms with periods 
shorter than this response time cannot be assessed, and rhythms with a period close to 
it will be distorted [10]. In other words, a cut-off frequency fx can be defined as the 
minimal frequency such that for f > fx the output signal remains practically constant 
(no variation in the data can be assessed). This means that too dense measurements 
are redundant and do not bring additional information. In the case of equidistant data 
obtained at At intervals, it has been recommended to choose At < l/4fT to assure a 
good approximation of Y(t) [11]. 

For chronobiological applications, this sampling requirement (to be able to recon- 
struct changes as a function of time in the context of the theory of signal processing) is 
often difficult to meet. Instead, sampling is used in its statistical meaning, where it 
refers to the selection of a few items from a population to draw inferences for that 
population. In terms of a data series, the selection of a sampling interval At > l/fx al- 
lows only some features of the biological variations to be assessed. In the absence of ex- 
ternal information, for data collected over an observation span T, only oscillations with 
periods in the range of T up to 2At can be assessed. The resolution with which a sig- 
nals period can be determined also depends on T: in frequency terms, the smallest 
difference in frequency between two distinct signals is 1/T. The highest (no longer as- 
sessable) frequency, 1/2 At, is called the Nyquist frequency, fN. Within the field of infor- 
mation theory, this is known as the Nyquist-Shannon theorem, which states that if a 
function Y(t) contains no frequencies higher than f^, it is completely determined by 
sampling Y(t) at intervals of l/2fN. 

Ostle [12] defines the design of an experiment as the complete sequence of steps 
taken ahead of time to ensure that the appropriate data are obtained in a way which 
allows for an objective analysis leading to valid inferences with respect to the stated 
problem. These steps include the statement of objectives, the formulation of hypoth- 
eses, and the choice of design and experimental procedure and of the statistical 
methods to be used. Principles underlying experimental designs rely on replication, 
randomization and control. Replication relates to repeated measurements to obtain an 
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estimate of uncertainty (experimental error or noise) used to derive statistical signifi- 
cance (P-values) and confidence intervals. Noise originates from variations in the bio- 
logical system considered not to be part of the deterministic portion of the signal, from 
errors external to the system (errors of experimentation, of observation, and/or of 
measurement), and from the transducer and sampler (instrumentation error). Reducing 
the experimental error increases the precision of experiments. Randomization is an 
important aspect of study design that allows researchers to proceed as though the as- 
sumption of independence of the observation errors is true, which is critical in applying 
a test of significance. Although randomization cannot guarantee independence, it re- 
duces the correlation that tend to characterize errors associated with experimental 
material (experimental unit or data) adjacent in space or time, while also improving ac- 
curacy. Control relates to the amount of balancing, blocking and grouping of the ex- 
perimental units [12]. 

The number of replications needed for a given probability of detecting a given differ- 
ence with statistical significance depends on the standard error per experimental unit 
[13]. This means that small sample sizes can easily detect large differences, whereas 
small differences require larger sample sizes. When dealing with rhythmic variables, a 
sizeable portion of the variance stems from the rhythmic variation. Assessing the 
rhythmic behavior is thus important to reduce the error term. One important feature 
of chronobiological study designs is that rhythm stage is often the primary factor, as 
when assessing the relative efficacy or toxicity of a given treatment administered at dif- 
ferent stages of the circadian rhythm. The power of testing for a time effect is usually 
only slightly affected by the number of timepoints considered when results are ana- 
lyzed by cosinor, but not when performing an analysis of variance. This difference in 
approach accounts in part for the controversy between classical designs advocating 
fewer test groups [14] and chronobiological designs recommending at least 6 time- 
points per cycle [15-17]. 

In the framework of chronobiological study designs, three kinds of data can be distin- 
guished, which determine the choice of method for their analysis and how the results 
can be interpreted. Longitudinal sampling corresponds to obtaining data on the same 
individual (experimental unit) as a function of time. One example is the around-the- 
clock monitoring of blood pressure at about 30-minute intervals for 7 days. Results 
apply to this particular individual. Transverse (cross-sectional) sampling consists of 
obtaining only one value per individual (experimental unit), different individuals pro- 
viding data at the same or different sampling times. Time series of survival times are 
one example of transverse data. When individuals represent a random sample of a 
given population, results can be generalized for that population. Hybrid (linked cross - 
sectional) sampling consists of taking a few serial measurements from several individ- 
uals (experimental units). For instance, circulating prolactin is determined at 20-minute 
intervals for 24 hours in women at low or high familial risk of developing breast cancer 
later in life. The circadian rhythm can be determined for each woman and summarized 
across all women in each group for assessing any difference as a function of breast can- 
cer risk [18]. When individuals represent a random sample of their respective popula- 
tions, results can be generalized to these populations. 

Quite generally, but particularly when sampling is performed on more than a single 
individual, it is important that they are synchronized. Synchronizers (environmental 
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periodicities determining the temporal placement of biological rhythms) serve this 
purpose. The rest-activity schedule or the light-dark regimen are effective synchro- 
nizers and can be used to determine a reference time (such as time of awakening or 
light onset in preference to clock hours such as local midnight). Staggered lighting reg- 
imens have been used to facilitate data collection in the experimental laboratory [19], 
making it possible to collect data over several days [20]. Marker rhythms [21] are a 
useful check of whether synchronization has been achieved, further providing an 
internal reference. Activity, temperature, heart rate and blood pressure are some useful 
marker variables that can easily be monitored longitudinally. For instance, blood 
pressure has been used to guide the timing of administration of anti-hypertensive 
medication while also providing information regarding the patients response to 
treatment [22]. 

Summary statistics 

Before proceeding with any data analysis, it is recommended to first plot the data 
as a function of time. Such a chronogram can be informative in several ways. The 
presence of obvious rhythmicity may be recognized and its relative prominence as 
compared to the noise may be qualitatively (macroscopically) assessed. When sam- 
pling covers several cycles, some measure of the cycle-to-cycle variability can be 
gained. The presence of any increasing or decreasing trends can be observed, as is 
the existence of any outliers. After curve fitting, a chronogram of residuals can also 
provide valuable information regarding the adequacy of the model, and the need 
for data transformation. 

A histogram should also be prepared to obtain an estimate of the mean value with its 
standard deviation, and to check on the assumption of normality. For instance, a long- 
tailed distribution is indicative of the need for data transformation. Alternatively, the 
use of robust methods (such as those based on ranks; [23]) may be indicated. 

When prior information suggests the presence of a rhythm with known period, stack- 
ing the data over a single cycle reduces the noise and reveals the rhythm s waveform. 
Historically, this approach was used by Franz Halberg to resolve confusing variability in 
blood eosinophil counts [24-26]. It was also instrumental in showing that the circadian 
rhythm in core temperature of Fischer rats persisted after the bilateral lesioning of the 
suprachiasmatic nuclei, albeit with a reduced amplitude and a phase advance [27,28]. It 
should be noted, however, that stacking the data over an assumed period may yield 
spurious results if the signals period differs from its assumed value. For this reason, it 
is highly recommended to analyze the original data first before proceeding with any 
stacking. For instance, it is not uncommon to present data by calendar month, even 
when the data have been collected over several years. This procedure limits the ability 
to resolve any periodic signal with a period different from precisely 1 year or its 
harmonic terms (6, 4, 3 months, ...). Stacking the data over a period that has been vali- 
dated can be complemented by an analysis of variance testing for a time effect when 
data are binned into a given number of classes of equal duration covering the full cycle 
(e.g., six 4-hourly classes covering 24 hours). An F-test then serves for testing the 
equality of class means. While this procedure remains applicable for non-equidistant 
data, the result depends on the choice of the number of classes used for binning and 
on the choice of the reference time [29] . 
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Single cosinor 

Historically, the single cosinor was developed to analyze short and sparse data series 
[2,30-32]. Periodograms and classical spectra originally used in chronobiology [33,34] 
required the data to be equidistant and to cover more than a single cycle. Whereas 
some spectral analysis techniques are now available to analyze non-equidistant data 
[35-37], algorithms available in most software packages remain limited to the case of 
equidistant data. 

Least squares procedures do not have this limitation. They are thus useful in curve- 
fitting problems, where it is desirable to obtain a functional form that best fits a given set 
of measurements. Although periodic regression presents its own limitations, being sensitive 
to outliers and not having any constraint to conserve the variance in the data, it possesses 
two important features: first, when data are equidistant, results at Fourier frequencies are 
identical to those of the discrete Fourier transform [38]; and second, it advantageously uses 
prior information. Thus, after the existence of ubiquitous circadian rhythms was demon- 
strated, it was possible to apply the single cosinor method in many experiments aimed at 
determining the times of highest efficacy and lowest toxicity in response to a variety of 
drugs and other stimuli by fitting a 24-hour cosine curve to 6 values, 4 hours apart, each 
value representing the number of experimental animals that survived a given intervention 
applied at one of the 6 timepoints when overall about 50% of the animals had died. These 
results led to the fields of chronopharmacy and chronotherapy [39-42]. 

Single-component cosinor 

Notably in studies of circadian rhythms, it is indeed possible to assume that the period 
is known, being synchronized to the external 24-hour cycle. The regression model for a 
single component can be written as 



where M is the MESOR (Midline Statistic Of Rhythm, a rhythm- adjusted mean), A is 
the amplitude (a measure of half the extent of predictable variation within a cycle), (|) is 
the acrophase (a measure of the time of overall high values recurring in each cycle), t 
is the period (duration of one cycle), and e(t) is the error term (Figure 1). 

When T can be assumed known, using well-known trigonometric angle sum identity, 
the model can be rewritten as 



p = Acos0;y = -Asin0;x = cos(2nt/T);z = sin(2nt/T). 

The principle underlying the least squares method is the minimization of the residual 
sum of squares (RSS), that is the sum of squared differences between measurements Yi 
(obtained at times ti, i = 1, 2, N) and the values estimated from the model at corre- 
sponding times 



Y(t) = M + Acos(2nt/T + 0) + e(t) 



(1) 



Y(t) = M + px + yz + e(t) 



(2) 



where 




(3) 



This approach is valid when all individual standard deviations are equal, as is often 
the case. 
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t Time ► 

Reference time 



Figure 1 Definition of rhythm characteristics. The MESOR is a rhythm-adjusted mean; the double 
amplitude (2A) is a measure of the extent of predictable change within a cycle; the acrophase is a measure 
of the timing of overall high values recurring in each cycle, expressed in (negative) degrees in relation to a 
reference time set to 0°, with 360° equated to the period; and the period is the duration of one cycle. 
© Halberg Chronobiology Center. 



Estimates for M, |3, and y are obtained by solving the normal equations, obtained by 
expressing that RSS is minimal when its first-order derivatives with respect to each 
parameter are zero. 

The normal equations are 



^Yi = MN + p^Xi+y^Zi 
^YiXi = M^Xi + f^Xi"^ + Y^XiZi 
^YiZi = M^Zi + p^XiZi + yXI^'^ 



(4) 



or in matrix form 



/ N ^x. ^z. ^^ 

IZ'^i' I^^iZi 





or d = Su 



(5) 



Estimates of M, (3 and y (or vectorially, u) are thus obtained as 
u = S"M 



(6) 



Estimates for the amplitude and acrophase can be derived from the estimates of p 
and y by the following relations 



A 
0 



\ 1/2 

arctan (^-y /y^^ + Ktt where K is an integer 



(7) 



The correct value of 0 is determined by taking into account the signs of ^ and f . 

For rhythm detection, the total sum of squares (TSS) is partitioned into the sum of 
squares due to the regression model (MSS) and the residual sum of squares (RSS). TSS 
is the sum of squared differences between the data and the arithmetic mean. MSS is 
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the sum of squared differences between the estimated values based on the fitted model 
and the arithmetic mean. As noted above, RSS is the sum of squared differences be- 
tween the data and the estimated values from the fitted model 

TSS = MSS + RSS or ^(Yi-Y)^ = ^(Yi-Y)^ + ^(Yi-Yi)^ (8) 

The model is statistically significant when the model sum of squares is large relative 
to the residual sum of squares, as determined by the F test 

F = (MSS/2)/(RSS/(N-3)) (9) 

where 2 and N-3 are the numbers of degrees of freedom attributed to the model (k = 3 
parameters - 1) and to the error term (N - k). The null hypothesis (Hq) that there is 
no rhythm (the amplitude is zero) is rejected when F > Fi_ct(2, N-3), where a relates to 
the chosen probability level for testing Hq. 

For parameter estimation, it seems reasonable to consider the MESOR (M) separately 
and (p, y) together. The 1-a confidence interval for M is then given by 



M±ti.,/2(N-3)(7^si\ (10) 

where s-^ are the elements of S'^, 

a = [RSS/(N-3)]'/' (11) 

and tp(f) denotes the pth probability point of Students t on f degrees of freedom. The 
covariance matrix for (piY^ is given by 



/ c 1 c-1 
2 / ^22 ^23 

=*32 ^33 



from which a 1-a confidence region for {Pif^^ or equivalently for (A, 0) can be derived. 
It is based on the F-statistic used for rhythm detection, being evaluated at the estimated 
value of A instead of at A = 0. The resulting equation is that of an ellipse (Figure 2): 

E(xrx)2 (^-^)' + 2E(xi-x)(zi-z) (^-^) (rf ) + E(zi-z)2(r)>)' < 2a^V,.a{2, N-3) 

(12) 

where 



X 
and 

The region delineated by this ellipse represents the confidence region for the rhythm 
parameters. Conservative confidence intervals for A and 0 are obtained by computing 
the minimal and maximal distances from the pole (zero) to the error ellipse and by 
drawing tangents from the pole to the error ellipse, respectively [31]. These confidence 
limits are conservative in the sense that the area they delineate is larger than the confi- 
dence ellipse. Limits closer to those corresponding to the a-level chosen for testing can 
also be computed [43]. 
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Figure 2 Single-component single cosinor: hypothesis testing and parameter estimation. A cosine 
curve with a given period is fitted to tine data (top) by least squares. Tliis approacli consists of minimizing 
tine sum of squared deviations between tine data and tine fitted cosine curve. Tine larger this residual sum of 
squares is, the greater the uncertainty of the estimated parameters is. This is illustrated by the elliptical 95% 
confidence region for the amplitude-acrophase pair (bottom). When the error ellipse does not cover the 
pole, the zero-amplitude (no-rhythm) test is rejected and the alternative hypothesis holds that a rhythm 
with the given period is present in the data (left). Conservative 95% confidence limits for the amplitude 
and acrophase can then be obtained by drawing concentric circles and radii tangent to the error ellipse, 
respectively. When the error ellipse covers the pole, the null hypothesis of no-rhythm (zero amplitude) is 
accepted (right). Results (P-value from the zero-amplitude test, percentage rhythm or proportion of the 
overall variance accounted for by the fitted model, MESOR ± SE, amplitude and 95% confidence limits, 
acrophase and 95% confidence limits) are listed in each case. © Halberg Chronobiology Center. 



Standard errors (SEs) for A and 0 can also be derived from the covariance matrix for 
^y^, f ^, by using Taylors series expansion: 



SE(A) = (7 [s22 cos^0 -2s23 ^^^^ COS0 + S33 sin^0] 
SE(0) = a [s22 sin^0 + 2S23 ^^^0 cos0 + S33 cos^0] ^/^/A 
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Regression diagnostic tests 

It should be noted that the P-value obtained from the F-test and the corresponding confi- 
dence limits derived for M,y^ and y are valid only if assumptions underlying the use of 
the least squares procedure are satisfied. These assumptions are (1) the model fits the data 
well, (2) the residuals are normally distributed, (3) the variance is homogeneous, (4) the 
residuals are independent, and (5) the parameters do not change over time. 
Model adequacy: 

Goodness of fit can be examined when replicates are available, either from multiple 
data collection at different timepoints or from data covering multiple cycles. RSS can 
then be further partitioned into the "pure error" and the "lack of fit". An F-test compar- 
ing the pure error and lack of fit sums of squares provides a test of the model adequacy 
[44]. The pure error sum of squares (SSPE) is defined as the sum of squared differences 
(across all timepoints) between the data collected at a given timepoint and their re- 
spective arithmetic mean, whereas the sum of squares ascribed to lack of fit (SSLOF) is 
obtained by subtracting SSPE from RSS 

SSLOF = RSS-SSPE , , 

with Yi = (^^^jYii^ / ni where ni is the number of data collected at time ti. 
The appropriateness of the model is rejected if 

F = [SSLOF/(m-l-2p)]/[SSPE/(N-m)] > Fi.^(m-l-2p, N-m) (15) 

where m is the number of timepoints and p is the number of (cosine) components in 
the model (p = 1 for the single-component cosinor). 

In the presence of lack of fit, adding components in the model may be considered 
(Figure 3). 

Normality of residuals: 

The rankit plot provides an attractive visual technique to test normality [44,45]. In 
this test, the errors (ei) are sorted by increasing order and expected values of a normal 
sample of size N with zero mean (rankits, z^) are calculated. If the residuals are nor- 
mally distributed, the regression of ei on Z[ is a straight line. The Shapiro-Wilk test of 
normality can be applied for small sample sizes (N< 50) [46]. For larger sample sizes, a 
chi-square test of goodness of fit can be used [23] by comparing expected and observed 
frequencies of residuals grouped in classes. 

Homogeneity of variance: 

The variance of the error term sometimes depends on the expected level of the vari- 
able examined. This can be the case of hormonal data such as melatonin that assumes 
large values by night but only very low values during the day. All day-time values being 
very small, the variance is also small. But as night-time data can vary greatly, their vari- 
ance is also inflated. Deviation from variance homogeneity can be revealed by a plot of 
residuals as a function of the fitted values [47]. A horizontal band around zero in such 
a plot indicates that the assumption is valid. If it is violated, data can be transformed, 
for instance by taking their square root or their logarithm. A numerical test can also be 
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Figure 3 Multiple-component single cosinor. Systolic blood pressure data collected over 7 days by a 


45-year old woman fitted with a 24-hour cosine curve indicate the presence of lack of fit, departure from 


normality of residuals, and inhomogeneity of variance (left). The addition of a 12-hour component (middle) to 


the model (right) yields a better fit for which underlying assumptions are validated. © Halberg Chronobiology 


Center. 









performed by fitting the model to the square of the estimated values instead of the data 
in order to obtain residuals rei. If 

F = (N-2p-2)rV(l-r2) > Fi.,(l, N-2p-2) (16) 

where r denotes the regression coefficient of rei on ei, the assumption of homogeneity 
of variance is rejected. 
Independence of residuals: 

While violation of independence usually does not affect the estimate of the parameter 
themselves, their confidence intervals tend to be under-estimated [48]. When residuals 
are positively correlated, they tend to assume the same sign for long sequences. The 
runs test is a non-parametric test that allows to test whether sequences (runs) of posi- 
tive and negative residuals occur randomly. Specifically, if successive errors are inde- 
pendent, there cannot be any regular sequences, either too long or too short. In other 
words, the number of runs cannot be too small or too large, respectively. For a given 
sample size, tables list limits for acceptable numbers of runs compatible with the as- 
sumption of independence [23,49,50]. 

When residuals are correlated, the data can either be low-passed filtered by averaging 
or decimation (using only one every k values, thereby lengthening the sampling interval 
from At to kAt). A slightly different model can also be considered wherein the error 
term is replaced by an autoregressive error term [10,51,52]. 

Stationarity: 

The problem of stationarity arises primarily in long time series, when the MESOR, 
amplitude, acrophase and/or period can change as a function of time. This may occur, 
for instance, when a person being monitored travels across time zones (e.g., from the 
USA to Europe). A head cold or pain can also bring about transient changes in circa- 
dian rhythm characteristics of variables such as body temperature, blood pressure or 
heart rate. Changes in period can be anticipated when time clues (environmental syn- 
chronizers) are removed. They are also present in the case of components with no 
strong environmental synchronizer. This concerns primarily non-photic cycles such as 
the about 1.3-year transyear and the about 5-month cis-half-year found in solar wind 
speed and solar flares, respectively. These components are wobbly by nature, even in 
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the environment. Counterparts in biology have been found, as discussed elsewhere 
[3,53-56]. There are several approaches available to analyze non-stationary time series, 
such as wavelets, short-term Fourier transforms, and gliding spectral windows comple- 
mented by chronobiologic serial sections, as discussed below. 

For short and sparse time series, for which the cosinor method was originally devel- 
oped, underlying assumptions are usually valid (or at least statistical power is not suffi- 
cient to invalidate them). With longer and denser data, the likelihood of violating one 
or several underlying assumptions increases. Violation of one assumption can also re- 
sult in violating one or several other assumptions. For instance, when there is lack of 
fit, residuals tend not to be independent and may not follow a normal distribution. 
Confidence intervals and P-values tend to be affected more than the estimation of pa- 
rameters. Even when one or several underlying assumptions are violated, the informa- 
tion from the cosinor analysis may be of value as long as results are properly qualified. 
Many of the conventional methods of data analysis depend on similar assumptions, 
with the exception of robust non-parametric techniques [10,57-61]. 

Multiple-component cosinor 

The single-component cosinor is easily extended to a multiple-component model 
(Figure 3) 

Y(t) = M + ^ Aj cos(2nt/Tj + 0^) + e(t),j = 1,2, ...,p (17) 

Instead of solving a system of 3 equations in 3 unknowns, there are 2p + 1 normal 
equations to estimate M and p pairs of (pj, yj) or (Aj, (|)j) when Tj are assumed known. 
Generally, in the normal equations d = Su 

d = ^ XiyYi and Su = ^.Uj^.XiyXij for i = 1, 2, N; (18) 
j = 1, 2, 2p + 1; andv = 1, 2, 2p + 1 

where {xij} are the cos(2nti/Tj) and sin(2nti/Tj). 
Estimates of u (M, |3i, yi, |32, y2> Pp, Yp) are obtained as 

u = S'^X'Y where S = X'X = |^.XijXik . 

A confidence ellipsoid can be determined [43] from which approximate confidence 
intervals can be derived for each components amplitude and acrophase, as outlined 
above. Computations are greatly simplified in the case of equidistant data covering an 
integer number of cycles [45] . 

A multiple-component model is useful to obtain a better approximation of the sig- 
nals waveform when it deviates from sinusoidality. For instance, a 2-component 
model consisting of cosine curves with periods of 24 and 12 hours has been exten- 
sively used to analyze ambulatory blood pressure data (Figure 3) [62-64]. On the 
average, these two components account for the larger overall variance in this case 
[65]. This model is usually well-suited to approximate the nightly drop in blood 
pressure that reaches a nadir around mid-sleep [66], the slight increase thereafter 
and a sharper increase after awakening, the post-prandial dip seen more prominently 
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in the elderly [67], and a slow decline in the evening. Whereas better-fitting models 
can be obtained for each individual patient, the choice of a given model used as a 
reference standard makes it possible to derive reference values (such as 90% predic- 
tion limits) for the models parameters for specified populations, usually clinically 
healthy men or women in several age groups [62], Deviations from these norms can 
then be viewed as indicative of rhythm alteration. In addition to the well-known car- 
diovascular disease risk associated with an elevated blood pressure MESOR, out- 
come studies [68-72] have determined that certain other rhythm alterations affecting 
the circadian amplitude and acrophase are also associated with an increase in car- 
diovascular disease risk [64]. 

Population-mean cosinor 

When data are collected as a function of time on 3 or more individuals, the 
population-mean cosinor procedure renders it possible to make inferences concerning 
a population rhythm, provided the k individuals represent a random sample from that 
population. Each individual series is analyzed by the single- or multiple-component 

single cosinor to yield estimates of |u = M-i.^^.^fy^.^^-.y^., .,,,^^.,Y^.,i = 1,2, ...,k|. 

The goal is to make inferences concerning the population averages of the parameters, 
u*. The indicates that the expectations are population averages and not averages 
over the k individuals sampled. Individual vectors Ui are assumed to represent a 
random sample from a (2p + l)-variate, normal population with mean u*. The within- 
individual variances are also assumed to be equal, so that the pooled estimate of vari- 
ance can be estimated as 



When the sample sizes for all individuals are the same or almost the same, as is often 
the case in hybrid designs, the population estimates are unweighted averages of the in- 
dividual parameters 



and the population amplitudes and acrophases can be estimated using the relations 



In the above conditions and assuming normality of errors and individual parameters, 
sample variances can be computed as 




(19) 




(20) 



y^* = A* COS0*; f * = -A* sin0* 




Zj(MpM*)V(k-l);a|„ = Zj06„j-^*)2/(k.i). 

= Zj(f„j-f *)V(k-l);a2i^i = Zj(Mi-M*)(^ij-^i*)/(k-l); 



(21) 



and similarly for the other cross-products 



where 



j = 1, 2, k and n = 1, 2, p 
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In the case when the population-mean cosinor can be applied separately for each trial 
period (p = 1), a confidence interval for M* is given by 

M*±ti.,/2(k-l)aM/k'/' (22) 

and a joint 1-a confidence ellipse for (^*, f *) consists of all points (|3/,y/) satisfying 



2 /-2 



(23) 



2(l-r2)(k-l)Fi.„(2,k-2)/(k(k-2)) 
where 

The null hypothesis of A* = 0 is rejected if 

[(k(k-2))/(2(k-l))] [l/(l-r2)] [^*yaj-2r^*f*/a^a, + y^'/al] > F,.„{2,k-2) (24) 

and approximate confidence intervals for A* and 0* can be obtained by computing the 
minimal and maximal distances from the pole (zero) to the error ellipse and by drawing 
tangents from the pole to the error ellipse, respectively (Figure 4). As for the single 
cosinor, closer approximate limits can also be computed [43]. 

Parameter tests 

Test statistics have been developed to test the equality of MESORs, amplitudes and 
acrophases considered jointly or separately for the case of the single cosinor and the 
population-mean cosinor [43]. These tests can allow for a clearer interpretation of the 
results, for instance in a circadian experiment involving 6 timepoints 4 hours apart: 
Student t- tests are sometimes applied at each separate timepoint without adjustment of 




J3= Acos<z^ 



7 = - Asin^ 

Figure 4 Elliptical confidence limits. The outer ellipse delineates the 95% confidence region for the joint 
estimation of the amplitude and acrophase (as a pair). Distances and tangents drawn from the pole to this 
outer ellipse yield conservative 95% confidence limits for the amplitude and acrophase considered 
separately as the area thus delineated is larger than the area of the outer ellipse. In order to obtain separate 
95% confidence limits for the amplitude and acrophase, distances and tangents need to be drawn to a 
somewhat smaller (inner) ellipse. For further details, see [43]. © Halberg Chronobiology Center. 
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the P-values for multiple testing; when differences in opposite directions are found, 
parameter tests may reveal a difference in the circadian amplitude in the absence of a 
difference in MESOR or in the circadian acrophase [73] . 

Least squares spectra and population-mean cosinor spectra 

The circadian rhythm is often prominent. It is also ubiquitous. These features enabled 
the single cosinor procedure to be applied to many short data series covering no more 
than a single cycle in order to yield valuable information regarding the organization of 
the circadian system in different species. Computationally, estimates of the MESOR, 
amplitude and acrophase can be obtained for any trial period. This procedure, however, 
is valid only if there is sufficient evidence for considering this particular trial period. In 
the absence of such evidence, results can no longer be taken at their face value. 

It has become much easier for chronobiologists to collect data over much longer 
spans and/or at much shorter intervals, but it has been more difficult to obtain series 
of equidistant data. Even for variables that are obtained with automated instrumenta- 
tion (such as telemetry or ambulatory blood pressure monitors), it is not uncommon to 
have missing data or to have additional data collected manually at times different from 
the scheduled times. Investigations have also extended outside the circadian realm. For 
these reasons, a least squares approach to time series analysis remains attractive, as 
long as caution is properly taken in interpreting the results. 

Just as a chronogram provides useful information prior to quantitative data analysis, 
a view of the time structure of the data in the frequency domain can also be inform- 
ative. For this purpose, using the cosinor at Fourier frequencies in the range of 1/T 
(where T is the length of the data series) up to 1/2 At (where At is the sampling inter- 
val) can be viewed as no more than another macroscopic view of the data. A plot of 
amplitudes as a function of frequency (least squares spectrum) is equivalent to a discrete 
Fourier transform when data are equidistant [38]. 

- Large spectral peaks indicate the presence of signals and provide an approximate 
estimate of their periods. This information can be used to validate anticipated 
components while also revealing the presence of other cycles. For rhythms that are 
anticipated, rhythm detection and parameter estimation can proceed as outlined 
above as long as P-values are adjusted for multiple testing [74]. Caution needs to be 
taken regarding non-anticipated cycles. The information thus gained can be used to 
design the next study or to examine other similar data series that could serve as 
replications. Additional analyses can be performed to determine the extent of 
stability of the unanticipated component, for instance by means of applying a 
chronobiologic serial section [21] or a gliding spectral window [75]. 

- Plotting log-amplitudes versus log-frequency provides useful information regarding 
the noise structure [65]. White noise corresponds to about the same background 
amplitudes across the entire frequency range. Larger background amplitudes at 
lower than at higher frequencies represents colored (or correlated) noise, indicating 
that underlying assumptions are not met, resulting in under-estimated P-values and 
too-liberal confidence intervals of rhythm parameters. The noise structure can in 
itself be valuable. It is used for instance in assessing the 1/f behavior of heart rate 
variability [76]. 
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- Single spectral peaks are found only if the data cover an integer number of cycles. If 
this is not the case, the signal spreads over several spectral lines [10]. When this 
happens and the underlying signal was anticipated, it is possible to determine the 
period (frequency) corresponding to the maximal amplitude by applying the single 
cosinor procedure not only at the Fourier frequencies but at additional 
intermediary frequencies as well. Whereas this may provide a clearer picture of the 
signal, it should be realized that the resolution in frequency (1/T) remains the 
same, being determined by the series length, T. Tapers such as a Manning window 
[77] can be used to reduce sidelobes associated with the finite observation span, but 
this procedure also affects the estimation of the rhythm parameters. While a 
Manning taper does not affect the location of spectral peaks in a spectrum, the 
width of the peak is wider and the amplitude is reduced (Figure 5). It remains 
useful, however, for a macroscopic view of the time structure of the data. 

Least squares spectra can be very helpful in exploratory analyses, but it should be re- 
alized that assumptions underlying the use of the single cosinor (notably independence 
and normality) are violated more often than not. Population-mean cosinor spectra are 
a useful complementary approach not prone to this limitation. This method is similar 
to the power spectrum obtained by smoothing the periodogram, which is more reliable 
for testing unknown periodicities [78], with the important difference, however, of retain- 
ing the phase information. The averaging (smoothing) can be done either in the frequency 
domain by averaging across consecutive Fourier frequencies, or in the time domain. The 
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Data tapered with Manning window 
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Figure 5 Effect of applying a Manning taper on the least squares spectrum. A simulated signal 
consisting of a fundamental and second harmonic of equal amplitudes sampled over 10 cycles (top left) 
is tapered with a Manning window (top right). Corresponding least squares spectra (bottom) indicate that 
while the spectral location of the two peaks remains the same, the amplitudes are reduced and the 
bandwidths are wider. Sidelobes are also greatly diminished. Simulation and original drawings from C. 
Lee-Gierke. © Halberg Chronobiology Center. 
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population-mean cosinor spectrum uses averaging in the time domain. The method 
consists of subdividing the observation span T into several (e.g., k) subsections (or inter- 
vals, I) of equal length (I = T/k). A least squares spectrum is computed for each interval, 
using the same common reference time. The population-mean cosinor procedure is then 
applied at each trial frequency to summarize results across the k intervals. The procedure 
can be repeated by using different values of k. Unknown signals consistently detected by 
this approach may thus be viewed with added confidence. 

Extended linear-nonlinear cosinor 

When the period is unknown, the single cosinor model (Equations 1 and 12) can no 
longer be linearized in its parameters as the period is in the argument of the cosine 
function. Starting from an initial (guess) estimate for the period, all parameters can be 
estimated using iterations aimed at minimizing the residual sum of squares. Marquardt 
[79] developed an algorithm which performs an optimum interpolation between the 
Taylor series and gradient methods. He also derived a way to approximate confidence 
intervals for all parameters, including the period [80]. For the particular case of single- 
component models, Bingham offers an easily understood approach [81]. 

For low-frequency signals, simulated annealing [82] is another suitable method that 
has the advantage of not requiring the specification of initial values for the periods. 
This approach does not perform well, however, for very sharp signals in the higher fre- 
quency range of the spectrum. Both simulated annealing and Marquardt s nonlinear ap- 
proach performed best in distinguishing two signals with close periods sampled over 
less than a beat cycle, when compared to other approaches [83]. 

For signals with a symmetrical waveform, the nonlinear procedure can yield an ac- 
ceptable estimate of the fundamental period on the basis of very short records not even 
covering a full cycle [84]. This is not the case, however, when the waveform is asym- 
metrical. Simulations indicate that about 5 cycles are needed to obtain a reasonable es- 
timate of the period in this case, when the model fitted includes only the fundamental 
component. Including additional harmonic terms in the model allows the nonlinear 
procedure to correctly estimate the fundamental period with data covering no more 
than 2 cycles [84]. 

Analysis of non-stationary data 

When data are equidistant or rendered equidistant by averaging and filling data gaps by 
interpolation, wavelets can be performed [85]. This approach has been useful to un- 
cover components not detected earlier [86]. Short-term Fourier transforms can be used 
to visualize changes in the spectral structure of the data as a function of time [87]. 
Alternatively, gliding spectral windows [75] can be computed. The method consists of 
defining an interval (I) that is progressively displaced by a given increment (5t) 
throughout the data series. A least squares spectrum is computed over each interval 
over a specified frequency range. Both a fractional harmonic increment (5h < 1) and 
overlapping intervals (5t < I) are chosen to help visualize the time course of changes in 
frequency and/or amplitude occurring as a function of time in a 3D graph and/or a sur- 
face chart. In such a display, time and frequency are the two horizontal axes and ampli- 
tude is shown on the vertical axis in a 3D plot or as different shadings in a surface 
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chart. One example relates to competing about 24.0- and 24.8-hour components coex- 
isting in the physiology of an apparently seleno-sensitive woman with adynamic epi- 
sodes recurring twice a year and lasting 2-3 months, as illustrated for systolic blood 
pressure in Figure 6. Another example illustrates the changing prominence of the 
about-weekly and about-daily rhythms in blood pressure and heart rate during the first 
40 days of life of a clinically healthy boy [88]. Whereas the procedure can be performed 
on non-equidistant data, the interpretation of results is greatly helped when data are 
equidistant, as changes in sampling rate are also associated with changes in spectral 
structure appearing on the graph. A judicious choice of I and of the frequency range 
examined is important in order to minimize sidelobes. The use of a Manning taper [77] 
is also helpful in this kind of exploratory analysis. 

Whenever focus can be placed on a specified component with a given trial period, a 
chronobiologic serial section [21] can be performed. As for the gliding spectral window, 
an interval I is selected that is progressively displaced throughout the time series by 5t 
increments. To data in each interval, the single-component single cosinor procedure is 
applied. To visualize the results, a chronogram is shown on top, followed by the se- 
quence of MESORs, amplitudes and acrophases as they change as a function of time, 
provided with a measure of uncertainty. Corresponding P-values from the zero- 
amplitude test and the number of data per interval are also displayed to help interpret 
any change in the results. This procedure has been extensively used in studies of phase 
shifts associated with transmeridian flights [89], as illustrated in Figure 7, and in cases 
when the circadian rhythm is desynchronized from 24 hours [90,91]. 
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Figure 6 Gliding spectral window (surface chart). Systolic blood pressure was automatically measured 
around the clock by a 62-year old woman with recurring episodes of adynamic depression occurring twice 
a year and lasting 2-3 months. Complementary nonlinear analyses (not shown) indicate the coexistence of 
about 24.0- and 24.8-hour components, their relative prominence alternating between wellness and adynamic 
depression. Changes in the most prominent circadian period as a function of time are apparent from the 
changes in amplitude (shading) and location along the vertical scale. © Halberg Chronobiology Center. 
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Figure 7 Chronobiologic serial section. Peak expiratory flow was self-measured several times a day by a 
53-year old man. The data covering a 14-month span are shown in row 1. They are analyzed in a 20-day 
interval progressively displaced by 2 days. Data in each interval are fitted with a 24-hour cosine curve. 
From the P-values shown in row 2, it can be seen that the circadian rhythm was detected with statistical 
significance most of the time, except for two short spans, one coinciding with a transmeridian flight (when 
fewer data were collected, row 5) and the other with influenza. Whereas the 24-hour acrophase remains 
relatively stable throughout the record (row 4), the MESOR (row 3, lower curve) and to a lesser extent the 
circadian amplitude (row 3, distance between the two curves) undergo sharp changes, notably in association 
with the influenza and earlier with a change in treatment timing. © Halberg Chronobiology Center. 



The procedure has been extended in two ways. First, a multiple-component instead 
of a single-component single cosinor model can be fitted in each interval. This proced- 
ure has been used for instance to illustrate that the prominence of an about 5-month 
component of heart rate self-measured over 4 decades by a clinically healthy man 
follows the about 11 -year change in solar flares in which this component had been doc- 
umented [92]. In this analysis, the 5-month component was fitted together with 1.0- 
and 0.5-year components over a 4-year interval displaced by 0.2-year increments [93]. 
Figure 8 illustrates the changing prominence of the three components as a function of 
time. An about 11-year cycle in the prominence of the about 5-month component is 
highlighted. Second, a nonlinear model can be fitted in each interval and the period 
displayed as a function of time with its 95% confidence interval. This approach was 



Cornelissen Theoretical Biology and Medical Modelling 2014, 1 1:16 
http://www.tbionned.conn/content/1 1/1/16 



Page 1 9 of 24 




0.0 I ^ 0 

12/31/1969 12/31/1977 12/31/1985 12/31/1993 12/31/2001 

Time (calendar date) 



Figure 8 Multiple-component serial section. Heart rate, self-measured a few times each day by a healthy 
man over several decades, was averaged over consecutive weeks. The weekly averages are fitted with a 
3-component model consisting of cosine curves with periods of 1.0, 0.5, and 0.41 year in a 4-year interval 
moved by 0.2 year throughout the entire record. The time course of amplitudes of the 3 components is 
shown on top. Solid-filled, light-filled and empty symbols correspond to P-values <0.05, 0.05 < P < 0.10, 
and >0.10 from the zero-amplitude tests, respectively. Results for the 0.41 -year component are reproduced 
below, where they are compared with the solar flare index that had been reported by physicists to be 
characterized by an about 5-month (0.41 -year) component. The prominence of the about 5-month 
component in human heart rate follows an about 1 1-year cycle, which is similar to that characterizing solar 
flares. © Halberg Chronobiology Center. 



used to illustrate the great variability in the length of the about 11 -year solar activity 
cycle, Figure 9 [94]. 

Discussion and conclusion 

There are, of course, other important tools for the analysis of time series [95-103]. 
Most of them, however, require that the data be equidistant. This overview focused spe- 
cifically on the use of the cosinor and its different extensions. The method is fairly sim- 
ple and its results lead to meaningful interpretations. Despite its several shortcomings 
related primarily to the difficulty of satisfying all assumptions underlying the use of 
regression techniques, its wide-ranging applications have played an important role in 
the development of chronobiology as a quantitative scientific discipline. Used with cau- 
tion, results based on a combination of exploratory analyses with the different cosinor 
routines and other conventional statistical tests, progress has also been made in the 
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Figure 9 Nonlinear serial section. The period of the about 1 1 -year cycle in solar activity is estimated by 
nonlinear least squares applied to yearly Wolf numbers analyzed in a 35-year interval progressively moved 
by 5 years throughout the time series. The solar activity cycle has a period that can vary from about 9 to 
15 years, shown here with its 95% confidence interval and compared with the official cycle length. 
© Halberg Chronobiology Center. 



field of chronomics which aims at mapping broad time structures from the high- 
frequency brain waves to the multi-decadal cycles characterizing space-terrestrial wea- 
ther influencing human physiology and pathology [3,104]. 

Despite its simplicity, some reluctance remains for some investigators to use the cosi- 
nor for estimating rhythm parameters or for considering more than a single test time 
in designing experiments. Too many studies still rely on testing only at a fixed time of 
day (to control for the circadian variation) or at most at two times 12 hours apart, ig- 
noring the possibility that the two selected timepoints may be at the midline crossings 
rather than at the peak and trough where differences are maximal. As discussed else- 
where, such practice can be misleading in missing an existing difference or even in 
finding a difference in mean when none exists [15-17]. Computing day-night differ- 
ences in lieu of an amplitude and acrophase is also widely done to interpret ambula- 
tory blood pressure monitoring records in terms of "dipping" [105], despite the 
documentation in several outcome studies of the superiority of a chronobiological 
approach [106,107]. 

To some extent, this status quo may be accounted for by the lack of dissemination of 
computer software offering chronobiologists tools for time series analysis applicable to 
non- equidistant data. This situation is slowly changing, however. Personal computers 
have become more powerful and statistical packages have become more readily avail- 
able for relatively easy use by investigators not necessarily versed in all statistical details 
underlying the programs included in the software packages. While professional statis- 
tical software packages remain somewhat expensive for individual users, several open- 
source packages (such as Octave and R) offer an attractive alternative, notably since 
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some are platform-independent, running on PCs, Macs or Linux systems [108]. Pro- 
grammers have taken advantage of the tools available in these packages to write code 
to perform analytical tasks of interest to chronobiologists. Perhaps the most compre- 
hensive package is that developed by Oehlert and Bingham [109], offering a large array of 
procedures that can be applied by writing minimal coding instructions to call the different 
macros. Selected programs used in chronobiology have long been offered on the website 
of Refinetti [110], with clear instructions on how to run the programs. While not open- 
source, the Expert Soft Technologie website [111] also offers an array of cosinor-based 
and other procedures, including techniques for the study of non-stationary signals. These 
programs have been used in the study of shift- workers [112]. 

In summary, selected methods for the study of biologic time series have been reviewed 
and their relative merits have been discussed in the light of underlying assumptions. Some 
illustrative applications have also been mentioned. When the choice of a model is justified, 
and it is functional and explicative, quantitative methods of data analysis are extremely 
valuable to specify the model and obtain estimates of its parameters. Even when under- 
lying assumptions are not fully met, point estimates of the parameters can be very useful. 
More caution is needed, however, in deciding whether P-values and confidence intervals 
are trustworthy, since violation of underlying assumptions tends to yield results that 
are too liberal. Once this limitation is taken into consideration, data analysis methods 
as described herein constitute extremely valuable tools for research in chronobiology 
and chronomics. 
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